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Abstract 

We discuss two different even and convex non-negative smooth approximations of the 
absolute value function and apply them to construct MM algorithms for least absolute 
deviation regression. Both uniform and sharp quadratic majorizations are constructed. 

As an example we use the Boston housing data. In our example sharp quadratic 
majorization is typically 10-20 times as fast as uniform quadratic majorization. 
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Note: This is a working paper which will be expanded/updated frequently. All suggestions 
for improvement are welcome. The directory deleeuwpclx.net/pubfolclers/absapp has a pdf 
version, the bib hie, the complete Rmd hie with the code chunks, and the R source code. 

1 Introduction 

Various problems in computational statistics involve absolute values. We mention i\ regression, 
with the median as a special case, unidimensional scaling, support vector machines, and 
regression with t\ penalty terms. Using absolute values can create problems for optimization 
algorithms used in fitting, because the absolute value function is not differentiable at zero. 
Smooth parametric approximations to the absolute value function have been suggested by, 
among many others, Ramirez et al. (2014), Voronin, Ozkaya, and Yoshida (2015), and Bagul 
(2017). 

It seemed to be useful exercise to use some of these approximations to construct majorization 
(nowadays MM) algorithms for various statistical techniques. 


2 Using square root 

2.1 Function 

The simplest, and most commonly used, approximation to x is 

f e (x) = Vx 2 + e 2 , (1) 

where e is some, presumably small, positive number. It is computationally optimal in the 
somewhat convoluted sense explained by Ramirez et al. (2014). Plots for e 2 equal to 1, .5, .1, 
and .01 are in figure 1. 
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X 

Figure 1: Square root approximation: functipn 

Clearly f e (x) > |x| for all x, and the maximum of f e {x) — |x| is e, attained at zero. Also 

lim f e (x) — Ixl = lim f € (x) — Ixl = 0. 

x—>oo x—t—oo 

2.2 First Derivative 

The first derivative is 


fe( X ) = 


x 


yjx 2 + e 2 

This is plotted, for the same values of e, in figure 2. 


3 
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Figure 2: Square root approximation: first derivative 

We see that f[ is an increasing function of x, and thus f e is convex. 

2.3 Second Derivative 

The second derivative, plotted in figure 3, is 


/"(*) = 


\Jx 2 + e 2 x 2 + e 2 ’ 
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X 

Figure 3: Square root approximation: second derivative 

The second derivative is non-negative, with the horizontal axis as its asymptote. It attains 
its maximum, equal to e -1 , at zero. This bound on the second derivative is the main reason 
for using e 2 in the definition of f e . 

3 Using convolution 

3.1 Function 

The following convolution approximation to x was suggested by Voronin, Ozkaya, and 
Yoshida (2015). 


9e(x) 



X 


y\ exp{ 



Carrying out the integration gives 

9e(x) = X ^2$ Q - 1 j + 2e0 Q . 


( 2 ) 


Because g e is a weighted sum (integral) of a set of convex functions it is convex. We have 
g t (x) > |x|, and the maximum of g e (x) — |x| is equal to e y^, attained at zero. 

Voronin, Ozkaya, and Yoshida (2015) show that as e —» 0 we have both uniform and L 1 
convergence of g e to |x|. Of course other convolution approximation with similar scale families 
(a.k.a. mollifiers ), which have support converging to zero while maintaining mass equal to 
one, if scale goes to zero, can also be used. 
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Figure 4 show the approximation for e equal to 1, .5, .1. and .01. 



X 


Figure 4: Convolution approximation: function 

3.2 First derivative 

The first derivative, in figure 5 is 


g'e 0 ) = 2$ Q -1. 


6 




X 

Figure 5: Convolution approximation: first derivative 

3.3 Second derivative 

The second derivative, in figure 6, is 


£{?) = 




X 

Figure 6: Convolution approximation: second derivative 
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The second derivative is positive and unimodal, with a maximum equal to 
zero. 



attained at 


4 Applications to Regression 

Consider the problem of minimizing 


<?{P)='52wi\yi-rf i P\ ( 3 ) 

2—1 

This problem has been studied in statistics for many, many years. See, for example, Dielman 
(2005) or Farebrother (2013). Our contribution to the computational aspects of the problem 
is modest, because we are mainly interested in its connection to majorization theory. 


4.1 AM/GM approach 


Let’s start with the square root approximation f e . The first majorization we construct is 
based on the AM/GM inequality. Because for each /3,/3 we have 

\/{to - t/3) 2 + - AX 2 + e 2 } < \{(Vi - AHf + e 2 } + {(» - x[p) 2 + e 2 }, 

it follows that 


n 1 n 1 

V - x[p) < - ,- - 

2 .=1 J(y , - xW 2 + 6 ^ 


2=1 


{{Vi ~ x iP) + (Vi ~ x 'iP) + 2e } 


Thus if V(/3) is the diagonal matrix with diagonal elements 

Vii(Ji) = 


V \yi - x 'iPY + e 2 

then majorization leads to an iteratively reweighted least squares algorithm. The update 
formula is 

p(k+i) = ^x'WV{^ k) )X)- 1 X'WV(^ k) )y. 


We apply this to the Boston housing data from the UCI repository (Newman et al. (1998)), 
loaded from the mlbench package (Leisch and Dimitriadou (2010)). 

data( "BostonHousing") 
y <- BostonHousing$medv 
z <- BostonHousing[, 1:13] 
x <- cbind (1, apply (z, 2, as.numeric)) 
w <- rep (1, length(y)) 


The R function llfn2() is used. 







hi <- llfn2 (x, y, aeps = le-2, itmax = 10000) 

We have convergence of function values, to a change of less than 10^{-10}, in 530 iterations. 
The output below gives both cr e , the approximating loss function that we minimize, and cr 
from (3), the actual l\ loss function that we approximate. 

## eps: 1.0e-03 itel: 530 sigma_e: 1559.812228 sigma: 1559.709732 
## beta: 

## 14.633179 -0.144086 0.036871 0.019540 1.278130 -8.961015 5.324724 
## -0.030748 -1.035830 0.183490 -0.010219 -0.728994 0.011279 -0.300423 

4.2 Uniform quadratic majorization 

The second majorization uses an upper bound for the second derivative, as in Voss and 
Eckhardt (1980), Bohning and Lindsay (1988), Lange (2016), p. 100, or De Leeuw (2016), 
chapter 11. It can be applied both to our square root f e and convolution g e approximations. 
First the square root. Because 


M x ) < My) + e2 {x -y) + - y ) 2 

we have 


J2 w iM]Ji-x l P) < J2 w iMyi~ x iP)-J2 w i 


y% - x 'iP 


i=l 


2=1 


2=1 


(yi - x 'iP) 2 + e 2 . 


] n 

ze i =i 


which gives the majorization algorithm 

/3 (fe+1) = (X'WX)- l X'W{X^ k) + e F e '(/3 W )), 

where F' f {[3) is a vector with elements 

rsf, / m yi~ x 'iP 
vUyi - x iP) = / ^=- 

\]Xvi - x 'iP) 2 + e 2 

The corresponding expression for the convolution approximation g e is 

'j2wige{yi-x' i P) < |2 $(—— X ^-) - l] x' i {f3-P)+ 

i= 1 i= 1 i=l { € J 

The majorization algorithm is 


2 e V tt . =1 


p{k+i) = (x’wxMx'wixpw + e ^lG' € M k) )) 

Given the shape of the second derivatives of the approximating functions, we expect this 
algorithm to converge slowly. The uniform upper bound is only good very close to the origin, 
and horribly bad everywhere else. The update function only makes a very small move in 
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the descent direction. We illustrate the slow convergence with two analyses of the Boston 
housing data, one using the square root and one using the convolution approximation, using 
the R routines llfu2() and llgu2(). 

For the f e approximation we find 

h2 <- llfu2 (x, y, aeps = le-2, itmax= 100000) 

## eps: 1.0e-03 itel: 31791 sigma_e: 1559.812229 sigma: 1559.709719 
## beta: 

## 14.636109 -0.144089 0.036873 0.019553 1.278381 -8.963910 5.324655 
## -0.030749 -1.035922 0.183485 -0.010218 -0.729064 0.011278 -0.300400 

We see that convergence to our default precision requires 31791 iterations. The results for 
the convolution approximation are similar. 

h3 <- llgu2 (x, y, aeps = le-2, itmax= 100000) 

## eps: 1.0e-02 itel: 16849 sigma_e: 1559.744234 sigma: 1559.708984 
## beta: 

## 14.780372 -0.144247 0.037000 0.020306 1.292046 -9.124744 5.323192 
## -0.030801 -1.041139 0.183103 -0.010149 -0.732826 0.011262 -0.299007 


4.3 Sharp quadratic majorization 

Uniform quadratic majorization fails rather miserably. But we have to realize that the 
AM/GM approach also leads to quadratic majorization, and in that case it works quite well. 
In order to improve the speed of convergence of the quadratic majorizations we turn to sharp 
non-uniform quadratic approximation, discussed in De Leeuw and Lange (2009). We use 
their theorem 4.5. 


Theorem 4.5. Suppose f(x) is an even , differentiable function on M such that the ratio 
f'{x)/x is decreasing on (0, oo). Then the even quadratic 

9(x) = ^^~(x 2 - y 2 ) + / (y) 

2 y 

is the best quadratic majorizer of f{x) at the point y. 

Both approximations f e and g e satisfy the conditions of the theorem. For f £ we find 

v^T? < 1 v /J T? (* 2 - y 2 ) + A 2 + ^ 

Or 


J2 W ife(Vi - XiP) < W ife(Vi - X'ifi) + ^E 


i— 1 


i =1 


2 *=i \/{Vi - x'ifiy + e 2 


{(y i - X ' i PY-(y i - X ' i PY}. 


But this is exactly the majorization provided by the AM/GM approach. Consequently the 
AM/GM approach provides us with the sharpest quadratic majorization of f € . We see that in 
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the Boston housing example sharp quadratic majorization reduces the number of iterations 
from 31791 for uniform quadratic majorization to 530. 

The relation between AM/GM and sharp quadratic majorization is both new and interesting, 
but it does not give us an alternative algorithm. We can just use AM/GM and the R function 
llfl(). For the convolution approximation g e , however, the result in the theorem gives a new 
non-uniform quadratic majorization. From theorem 4.5 

9e{x) < g e {y) +- f- - [x - y ), 


or 


n n ~ 1 n 2<F( 

J2 W i9e(Vi ~ x 'iP) < J2 Wige(Vi ~ x iP) + X J2 W * 


Vi-x'iPy 


2=1 


2=1 


2=1 


Vi ~ x 'iP 


-{{vis-m 2 -ivi-*$?}■ 


We analyze the data with the R routine llgn2(). 
h4 <- llgn2 (x, y, aeps = le-2, itmax= 1000) 

## eps: 1.0e-02 itel: 334 sigma_e: 1559.744234 sigma: 1559.708994 
## beta: 

## 14.778537 -0.144244 0.036996 0.020294 1.291961 -9.123603 5.323291 
## -0.030799 -1.041089 0.183106 -0.010150 -0.732776 0.011262 -0.299028 

For uniform quadratic majorization we needed 16849 iterations, for sharp majorization we 
only need 334. 


4.4 Rate of Convergence 

Suppose g is a majorization scheme for r, i.e. g : W 1 (E> M n —> M and r : M n —» M such that 
v( x iV) — T ( x ) f° r x iU and g(x,x) = t(x) for all x. Also assume sufficient differentiability 
and uniqueness of the minima in majorization iterations. De Leeuw (1994) (also see De 
Leeuw (2016)) shows that the convergence rate of the majorization algorithm is the largest 
eigenvalue of the matrix 

/ - {V ll g^x,x)Y 1 V 2 r{x) = ~{T> 11 g(x,x)}~ 1 T> 12 g(x,x). 

evaulated at the fixed point x. This is the Jacobian of the update function, that maps x^ to 
its successor x^ k+l \ 

Note that in our majorization example 77 is a quadratic in x. Thus the second derivatives T > L1 
do not depend on x and are simply the matrix of the quadratic form. 

If o{ (/3) = E/=i wJeiVi ~ X Y) then 

n _ 3 

V 2 a{(0) = e 2 Y j w i {(;yi - x'Y) 2 + e 2 } 2 x^\ 

2=1 
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If erf (/5) = E?=i WiQeiVi ~ x'iP) then 


V^iW = -±wM- 

6 i =1 


AP 


) X iA 


The function computeThatRate() gives the theoretical convergence rates at the t\ solution 
for the Boston housing data for various values of e and for all four quadratic majorizations 
(uniform-f, uniform-g, non-uniform-F, and non-uniform-g). 

epss <- c (5, 2, 1, .5, .1, .01, .005) 
beta <- hl$beta 

computeThatRate (x, y, w, beta, epss) 


## [1] 5.0000000000 
## [ 1 ] 2.0000000000 
## [ 1 ] 1.0000000000 
## [1] 0.5000000000 
## [ 1 ] 0.1000000000 
## [ 1 ] 0.0100000000 
## [1] 0.0050000000 


0.6576324826 0.5723976134 0.4279092359 0.3873481792 
0.8225953967 0.7835762713 0.5415263594 0.5286119029 
0.8963063025 0.8710636781 0.6125974817 0.5956478849 
0.9444461085 0.9288920175 0.7051830266 0.6940155591 
0.9889511967 0.9881274205 0.8800781159 0.9004846261 
0.9998468914 0.9999785172 0.9872466313 0.9985891937 
0.9999785958 0.9999999998 0.9964372125 0.9999999747 


From the output we see that the convergence rate tends to one if e j, 0. Thus asymptotically 
our quadratic majorization algorithms will have a sublinear convergence rate. 

As we have already seen in our example, the non-uniform methods converge much faster than 
the uniform ones. Throughout this report we have chosen e as some conventional numbers, 
without taking into account the scale of the data or the closeness of the fit. This is probably 
not a good idea, and especially in this large example our conventional e’s could very well be 
too small. 


We check this more closely by using non-uniform majorization for seven different values of e. 
The results are below. Epsilon is in the first column, followed by the results for the number 
of iterations, the value of the approximation function at the minimum, and the value of the 
least abosolute deviation function itself. 


## 

[1] 

5.000000 

15.000000 

3212.724906 

## 

[1] 

2.000000 

25.000000 

2027.412038 

## 

[1] 

1.000000 

32.000000 

1725.433167 

## 

[1] 

0.500000 

35.000000 

1615.242147 

## 

[1] 

0.100000 

89.000000 

1563.678895 

## 

[1] 

0.050000 

131.000000 

1561.000966 

## 

[1] 

0.010000 

530.000000 

1559.812228 


1580.815597 

1565.003116 

1561.816194 

1560.740384 

1559.955323 

1559.831086 

1559.709732 


For larger values of e the values of <j((3) and cr{((3) are very different. But the value of a(f3) 
is already quite close to its minimum value, and that is of course what we are interested in. 
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This is illustrated also in figure 7, where the 7 solutions for [3 are plotted. Only the first one, 
for e equal to 5, is any different, the rest are virtually indistinguishable. Thus if you want 10 
decimals precision in your regression analysis then by all means choose e small. But seeing a 
regression analysis with 10 decimals precision tell us more about the data analyst than it 
tells us about the data. 



predictor 

Figure 7: Square root approximation: different epsilon 


For completeness we repeat the analysis for the convolution approximation. The results are 
very much the same. 


## 

[ 1 ] 

## 

[ 1 ] 

## 

[ 1 ] 

## 

[ 1 ] 

## 

[ 1 ] 

## 

[ 1 ] 

## 

[ 1 ] 


5.000000 

2.000000 

1.000000 

0.500000 

0.100000 

0.050000 

0.010000 


13.000000 

22.000000 

27.000000 

32.000000 

110.000000 

123.000000 

335.000000 


2660.097037 

1815.279469 

1634.626462 

1580.827133 

1560.955511 

1560.122239 

1559.744234 


1589.022400 

1565.920065 

1561.803819 

1560.899992 

1560.006532 

1559.837335 

1559.708994 
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predictor 

Figure 8: Convolution approximation: different epsilon 


5 Discussion 

In future reports we may consider applications to nonlinear li, to the lasso, to support vector 
machines, and to unidimensional scaling. 

6 Code 

6.1 llfu2.R 

llfu2 <- 

function (x, 

y. 

w = rep(l , length (y)), 
bold = NULL, 
aeps = le-3, 
eps = le-10, 
itmax = 1000, 
verbose = FALSE) { 
ffunc <- function (x, y, b, aeps) { 
z <- y - drop(x 0 /o*°/ 0 b) 
return (sqrt (z ~ 2 + aeps 2)) 

> 

fgrad <- function (x, y, b, aeps) { 
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z <- y - drop(x y o *°/ 0 b) 

return (z / sqrt(z 2 + aeps 2)) 

> 

if (is.null(bold)) { 

bold <- lm.wfit (x, y, w) $coeff icients 

> 

zold <- ffunc (x, y, bold, aeps) 

fold <- sum (w * zold) 

void <- fgrad (x, y, bold, aeps) 

itel <- 1 

repeat { 

bnew <- lm.wfit (x, drop(x °/ 0 *°/ 0 bold) + aeps * void, w) $coeff icients 
znew <- ffunc (x, y, bnew, aeps) 
fnew <- sum (w * znew) 
vnew <- fgrad (x, y, bnew, aeps) 
if (verbose) 
cat ( 

"Iteration: ", 

f ormatC(itel , width = 3, format = "d"), 

"old: ", 
formate ( 
fold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"fnew: ", 
formate ( 
fnew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if (((fold - fnew) < eps) |I (itel == itmax)) 
break 

void <- vnew 
fold <- fnew 
bold <- bnew 
itel <- itel + 1 

} 

ff <- sum (w * abs (y - drop (x °/ 0 * 0 /o bnew))) 
return(list ( 
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itel = itel, 
f = fnew, 
ff = ff, 
beta = bnew 

)) 

} 

6.2 llfn2.R 

llfn2 <- 

function (x, 

y> 

w = rep(l , length (y)), 
bold = NULL, 
aeps = le-3, 
eps = le-10, 
itmax = 1000, 
verbose = FALSE) { 
if (is.null(bold)) { 

bold <- lm.wfit (x, y, w) $coeff icients 

> 

zold <- sqrt ((y — drop (x 7*7 bold)) 2 + aeps ~ 2) 

fold <- sum (w * zold) 

void <- w / zold 

itel <- 1 

repeat { 

bnew <- lm.wfit (x, y, void) $coeff icients 
znew <- sqrt ((y — drop (x 7*7 bnew)) 2 + aeps ~ 2) 
fnew <- sum (w * znew) 
vnew <- w / znew 
if (verbose) 
cat ( 

"Iteration: ", 

f ormatC(itel , width = 3, format = "d"), 

"old: ", 
formate ( 
fold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"fnew: ", 
formate ( 
fnew, 


16 


digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if (((fold - fnew) < eps) || (itel == itmax)) 
break 

void <- vnew 
fold <- fnew 
itel <- itel + 1 

> 

ff <- sum (w * abs (y - drop (x °/ 0 *% bnew))) 
return(list ( 
itel = itel, 
f = fnew, 
ff = ff, 
beta = bnew 

)) 


6.3 llgu2.R 

llgu2 <- 

function (x, 

y> 

w = rep(l , length (y)), 
bold = NULL, 
aeps = le-3, 
eps = le-10, 
itmax = 1000, 
verbose = FALSE) { 
cfunc <- function (x, y, b, aeps) { 
z <- y - drop (x %*% b) 

return (z * (2 * pnorm(z / aeps) - 1) + 2 * aeps * dnorm(z / aeps)) 

} 

cgrad <- function (x, y, b, aeps) { 
z <- y - drop(x b) 
return (2 * pnorm(z / aeps) - 1) 

> 

bnd <- aeps * sqrt (pi / 2) 
if (is.null(bold)) { 

bold <- lm.wfit (x, y, w) $coeff icients 

> 
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zold <- cfunc (x, y, bold, aeps) 

fold <- sum (w * zold) 

void <- cgrad (x, y, bold, aeps) 

itel <- 1 

repeat { 

bnew <- lm.wfit (x, drop(x 0 / 0 *°/ 0 bold) + bnd * void, w) $coef ficients 
znew <- cfunc (x, y, bnew, aeps) 
fnew <- sum (w * znew) 
vnew <- cgrad (x, y, bnew, aeps) 
if (verbose) 
cat ( 

"Iteration: ", 

f ormatC(itel , width = 3, format = "d"), 

"old: ", 
formate ( 
fold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"fnew: ", 
formate ( 
fnew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if (((fold - fnew) < eps) || (itel == itmax)) 
break 

void <- vnew 
fold <- fnew 
bold <- bnew 
itel <- itel + 1 

} 

ff <- sum (w * abs (y - drop (x °/ 0 *°/o bnew))) 
return(list ( 
itel = itel, 
f = fnew, 
ff = ff, 
beta = bnew 

)) 
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6.4 llgn2.R 


llgn2 <- 

function (x, 

y> 

w = rep(l , length (y)), 
bold = NULL, 
aeps = le-3, 
eps = le-10, 
itmax = 1000, 
verbose = FALSE) { 
cfunc <- function (x, y, b, aeps) { 
z <- y - drop (x b) 

return (z * (2 * pnorm(z / aeps) - 1) + 2 * aeps * dnorm(z / aeps)) 

} 

cgrad <- function (x, y, b, aeps) { 
z <- y - drop(x b) 
return (2 * pnorm(z / aeps) - 1) 

> 

if (is.null(bold)) { 

bold <- lm.wfit (x, y, w) $coeff icients 

> 

zold <- cfunc (x, y, bold, aeps) 
fold <- sum (w * zold) 

void <- cgrad (x, y, bold, aeps) / (y - drop(x bold)) 
itel <- 1 
repeat { 

bnew <- lm.wfit (x, y, w * void) $coeff icients 
znew <- cfunc (x, y, bnew, aeps) 
fnew <- sum (w * znew) 

vnew <- cgrad (x, y, bnew, aeps) / (y - drop(x bnew)) 
if (verbose) 
cat ( 

"Iteration: ", 

f ormatC(itel , width = 3, format = "d"), 

"old: ", 
formate ( 
fold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"fnew: ", 
formate ( 
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fnew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if (((fold - fnew) < eps) II (itel == itmax)) 
break 

void <- vnew 
fold <- fnew 
bold <- bnew 
itel <- itel + 1 

> 

ff <- sum (w * abs (y - drop (x °/ 0 *7o bnew))) 
return(list ( 
itel = itel, 
f = fnew, 
ff = ff, 
beta = bnew 

)) 


6.5 computeThatRate.R 

computeThatRate <- function (x, y, w, beta, epss) { 
for (eps in epss) { 

r <- y - drop (x °/,*°/, beta) 
v <- 1 / ((r ~ 2 + eps ~ 2) ~ (3/2)) 
d2_tau_f <- crossprod (x, v * w * x) * (eps * 2) 
v <- dnorm (r / eps) 

d2_tau_g <- (2 / eps) * crossprod (x, v * w * x) 

d2_eta_f_uq <- crossprod (x, w * x) / eps 

d2_eta_g_uq <- sqrt (2 / pi) * crossprod (x, w * x) / eps 

v <- 1 / sqrt (r ~ 2 + eps ~ 2) 

d2_eta_f_nq <- crossprod (x, w * v * x) 

v <- (2 * pnorm (r / eps) - 1) / r 

d2_eta_g_nq <- crossprod (x, w * v * x) 

r_f_uq <- 

1 - min (eigen (solve (d2_eta_f_uq, d2_tau_f))$values) 

r _g_ u q <- 

1 - min (eigen (solve (d2_eta_g_uq, d2_tau_g))$values) 
r_f_nq <- 

1 - min (eigen (solve (d2_eta_f_nq, d2_tau_f))lvalues) 
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r_g_nq <- 

1 - min (eigen (solve (d2_eta_g_nq, d2_tau_g))$values) 
print (c(eps, r_f_uq, r_g_uq, r_f_nq, r_g_nq)) 

} 

} 
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